Caddy HTTPS Servers
What
The entry point for all requests that are eventually managed by farfalla is a load-balanced fleet of Caddy servers. These servers are hosted in AWS and managed via Laravel Forge.
This first public layer takes care of automatically issuing and renewing certificates for both our own wildcard subdomains like alephdigital.publica.la and third party custom domains like digital.revisbarcelona.com.
Request Lifecycle
This section explains the complete lifecycle of a request in our multi-tenant application architecture, covering all the key services involved.
Here is a simplified overview of the request flow:
Key Parts of the System
- User: Located in Uruguay for this example.
- User's Device: A browser-enabled device.
- DNS: Resolves domain names to IP addresses.
- AWS Global Accelerator: Routes requests to the nearest AWS region. It also allows us to have a single domain as an entry point to the infrastructure, regardless of where the request is being made from. This configuration is done entirely inside the AWS console. The Global Accelerator ensures that the user is connected to the server with the lowest latency.
- AWS EC2: Hosts the Caddy server.
- Caddy: Acts as a reverse proxy and handles TLS management.
- ZeroSSL / Let's Encrypt: Provides HTTPS certificates.
- DynamoDB: Centralized storage for certificates.
- farfalla-https-guard: Microservice for validating domain and tenant information.
- AWS API Gateway 2.0 (HTTP APIs): Fronts AWS Lambda.
- AWS Lambda: Executes farfalla's logic.
- farfalla: Monolith service managing tenant-specific logic and core product features.
- PHP and Laravel: Language and framework behind farfalla.
Request Lifecycle Flow
This is an example using the URL https://digital.revisbarcelona.com/library.
- Initial Request from User
  - The user in Uruguay wants to load https://digital.revisbarcelona.com/library.
- DNS Lookup
  - The user's device performs a DNS lookup for digital.revisbarcelona.com, which resolves as:
    - digital.revisbarcelona.com → barcelona-wfsrt-59-ytqs.app.publica.la (CNAME).
      - It's our customer's responsibility to set up a DNS record of type CNAME pointing digital.revisbarcelona.com to the domain barcelona-wfsrt-59-ytqs.app.publica.la.
    - barcelona-wfsrt-59-ytqs.app.publica.la → ad83420ef3101bf80.awsglobalaccelerator.com (CNAME).
      - It's our responsibility to set up this DNS record of type CNAME.
      - We manage it in Cloudflare with these 3 CNAME records:
        - *.app.publica.la -> ad83420ef3101bf80.awsglobalaccelerator.com
        - app.publica.la -> ad83420ef3101bf80.awsglobalaccelerator.com
        - *.publica.la -> ad83420ef3101bf80.awsglobalaccelerator.com
    - ad83420ef3101bf80.awsglobalaccelerator.com has two A records: 76.223.34.22 and 13.248.160.216.
      - These A records are AWS's responsibility. We treat these IPs as dynamic and do not depend on their specific values.
- Global Accelerator
  - The device attempts to load the content from one of these IPs.
  - AWS Global Accelerator receives the request and checks its origin.
  - Global Accelerator routes the request to the nearest region, which is sa-east-1.
- Request Handling by EC2 and Caddy
  - The EC2 server in sa-east-1 receives the request.
  - Caddy, running in that EC2 server, inspects the domain digital.revisbarcelona.com, determines it's a custom domain, and executes the custom domain handler.
  - Caddy checks if it already has a valid certificate for the domain in its local cache. If not, it checks the centralized storage in DynamoDB.
  - If no certificate is found, Caddy calls the on_demand_tls.ask endpoint to verify whether it should generate a certificate.
- Domain Validation and Certificate Issuance
  - The on_demand_tls.ask endpoint is managed by farfalla-https-guard, hosted via Laravel Vapor and AWS Lambda.
  - Caddy receives a 200 response, confirming certificate generation is allowed.
  - Caddy creates an atomic lock in DynamoDB (LOCK-issue_cert_digital.revisbarcelona.com) to ensure no duplicate certificate issuance.
  - Caddy contacts ZeroSSL via API to generate a 30-day HTTPS certificate.
  - ZeroSSL requires domain validation using the HTTP_CSR_HASH challenge, which Caddy successfully completes.
  - ZeroSSL generates the certificate, which Caddy stores in DynamoDB:
    - certificates/zerossl/digital.revisbarcelona.com/digital.revisbarcelona.com.json: certificate metadata
    - certificates/zerossl/digital.revisbarcelona.com/digital.revisbarcelona.com.key: private key
    - certificates/zerossl/digital.revisbarcelona.com/digital.revisbarcelona.com.crt: certificate file
- Certificate Distribution and Cache Management
  - Other Caddy instances across the fleet can now retrieve the certificate from DynamoDB.
  - Caddy removes the atomic lock from DynamoDB and caches the certificate locally for future use.
- Reverse Proxy to farfalla
  - Caddy forwards the request via reverse proxy to https://farfalla-entry-point.publica.la. This specific endpoint is proxied through Cloudflare before reaching its origin (an AWS API Gateway V2 endpoint: d-uynhasnc45.execute-api.us-east-1.amazonaws.com).
  - This roundtrip through Cloudflare allows us to leverage its security features, including custom rate limiting, Web Application Firewall (WAF) rules, and automated DDoS mitigation, adding an extra layer of protection before requests hit our core application infrastructure.
  - This endpoint handles both wildcard subdomains like alephdigital.publica.la and custom domains like digital.revisbarcelona.com.
  - Caddy adds the X-Forwarded-Host header to indicate the original domain (digital.revisbarcelona.com) to the upstream service.
- farfalla Request Handling
  - AWS API Gateway 2.0 receives the request and forwards it to AWS Lambda.
  - AWS Lambda executes farfalla's logic, ensuring a warm execution environment is available.
  - farfalla's TenantServiceProvider checks the X-Forwarded-Host header and determines that the request is from digital.revisbarcelona.com.
- Tenant Resolution
  - farfalla checks whether digital.revisbarcelona.com is a valid tenant subdomain or final_domain.
  - It confirms the tenant ID (tenant_id=2) and initializes the app for that tenant using the CurrentTenant service.
  - This setup enables tenant-specific logic like the global tenant() helper.
- Response Generation and Delivery
  - farfalla processes the /library route and generates the HTML response.
  - The response is passed back to AWS Lambda, which hands it off to AWS API Gateway 2.0.
  - AWS API Gateway returns the response to Caddy, which forwards it to AWS Global Accelerator.
  - AWS Global Accelerator sends the final response to the user's device.
- Rendering
  - The user's device renders the page at https://digital.revisbarcelona.com/library.
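The DynamoDB key layout and lock name from the certificate issuance step can be sketched as a small shell helper. This is only an illustration, assuming the keys follow exactly the pattern shown for digital.revisbarcelona.com; the function names `cert_storage_keys` and `cert_lock_name` are hypothetical and are not part of Caddy or its storage module.

```shell
# Hypothetical helpers: derive the DynamoDB keys described in the flow above
# for a ZeroSSL-issued certificate. The naming pattern is taken from the
# example domain in this document and is assumed to hold for other domains.

# Print the three storage keys (metadata, private key, certificate) for a domain.
cert_storage_keys() {
  local domain="$1"
  local prefix="certificates/zerossl/${domain}/${domain}"
  printf '%s\n' "${prefix}.json" "${prefix}.key" "${prefix}.crt"
}

# Print the name of the atomic lock taken while the certificate is being issued.
cert_lock_name() {
  local domain="$1"
  printf 'LOCK-issue_cert_%s\n' "$domain"
}

cert_storage_keys "digital.revisbarcelona.com"
cert_lock_name "digital.revisbarcelona.com"
```

Knowing the expected keys makes it easier to spot whether a certificate was actually persisted, or whether a stale lock was left behind, when inspecting the table.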
Current servers and load balancing setup
We have separate fleets dedicated to staging and production.
| IP | Name | Location |
|---|---|---|
| 34.229.139.178 | custom-domains-prod-us-02 | us-east-1 (USA, N. Virginia) |
| 15.228.13.208 | custom-domains-prod-br-02 | sa-east-1 (South America, São Paulo) |
| 52.30.112.138 | custom-domains-prod-eu-01 | eu-west-1 (Europe, Ireland) |
| - | - | - |
| 18.209.57.166 | custom-domains-staging-us-01 | us-east-1 (USA, N. Virginia) |
| 18.231.143.167 | custom-domains-staging-br-02 | sa-east-1 (South America, São Paulo) |
At the moment we have the following architecture for both the Staging and Production environments:
Staging
Production
Caddy config
Automatic and On demand HTTPS
We use Caddy to manage almost all HTTPS certificates, except those managed by Cloudflare or AWS.
Specifically, these types of certificates:
- First level wildcards of *.publica.la for production subdomains we create automatically for each new tenant
  - alephdigital.publica.la
  - fgilio.publica.la
- Second level wildcards of *.app.publica.la for production subdomains we create automatically for each new tenant
- Custom domains for tenants that decide to set up their own via a CNAME.
  - digital.revisbarcelona.com
  - kiosco.latercera.com
- First level wildcards of *.publicala.me for staging subdomains we create automatically for each new tenant
  - reader-qa-staging.publicala.me
- Second level wildcards of *.staging-farfalla.publica.la for staging subdomains we create automatically for each new tenant
  - demoreaderqastaging.staging-farfalla.publica.la
Caddy is able to handle all of this automatically; more info is available in their docs here and here.
Wildcards
For wildcard subdomains we use free Let's Encrypt certificates. We don't use Let's Encrypt for everything because it has a very low rate limit.
We use Let's Encrypt via its standard ACME protocol.
It uses the DNS challenge, so we include the dns.providers.cloudflare module and provide API keys so that Caddy can automatically manage the TXT DNS records during the challenge.
Custom domains
For custom domains we use a paid ZeroSSL account.
We use ZeroSSL via its proprietary REST API, which Caddy supports natively.
It uses the HTTP_CSR_HASH challenge, which does not require a custom module.
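The split between the two issuers can be summarized with a small shell sketch. It is a simplification: the real decision is made by the site matchers in the Caddyfile, which is not shown in this document, and the function name `issuer_for_host` is hypothetical. It assumes every host under publica.la or publicala.me that reaches Caddy is covered by one of the wildcard certificates.

```shell
# Hypothetical helper: report which issuer would cover a given host name,
# following the wildcard vs custom domain split described above.
issuer_for_host() {
  case "$1" in
    # Our own wildcard subdomains (production and staging) use Let's Encrypt.
    *.publica.la|*.publicala.me)
      echo "letsencrypt" ;;
    # Anything else is treated as a tenant's custom domain and uses ZeroSSL.
    *)
      echo "zerossl" ;;
  esac
}

issuer_for_host "alephdigital.publica.la"
issuer_for_host "digital.revisbarcelona.com"
```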
Trusted Proxies
Within the (defaultSiteConfig) reverse_proxy configuration block, we utilize the trusted_proxies directive. This directive is essential for ensuring Caddy correctly identifies the original client IP address when requests pass through intermediaries like Cloudflare.
We configure trusted_proxies with the official list of Cloudflare's IP ranges. Maintaining an accurate list ensures that headers like X-Forwarded-For are processed reliably, which is crucial for logging, rate limiting, and security features.
The IP ranges are sourced directly from Cloudflare:
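As a rough sketch of how the list could be refreshed, the helper below turns a list of CIDR ranges into a single `trusted_proxies` line. The helper name `format_trusted_proxies` is hypothetical, and the usage line assumes Cloudflare's plain-text lists at https://www.cloudflare.com/ips-v4 and https://www.cloudflare.com/ips-v6; how the resulting line is merged into our Caddyfile is not covered here.

```shell
# Hypothetical helper: read one CIDR range per line on stdin, skip blank
# lines, and print a single "trusted_proxies <range> <range> ..." line.
format_trusted_proxies() {
  awk 'NF { printf "%s%s", (n++ ? " " : "trusted_proxies "), $1 }
       END { if (n) print "" }'
}

# Usage (requires network access). The echo guards against a list that
# does not end with a newline:
# { curl -fsS https://www.cloudflare.com/ips-v4; echo; \
#   curl -fsS https://www.cloudflare.com/ips-v6; } | format_trusted_proxies
```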
Misc
- Caddy adds a header called X-Caddy-Id with Forge's server ID to both response (to the user) and request (to farfalla) headers.
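To find out which server answered a request, the header can be read from any response. The sketch below is illustrative only; `extract_caddy_id` is a hypothetical helper name, and the domain in the usage line is just the example tenant used elsewhere in this page.

```shell
# Hypothetical helper: extract the X-Caddy-Id value from raw HTTP response
# headers on stdin. Header names are case-insensitive (HTTP/2 sends them in
# lowercase), and each header line ends with a carriage return.
extract_caddy_id() {
  awk -F': *' 'tolower($1) == "x-caddy-id" { sub(/\r$/, "", $2); print $2; exit }'
}

# Usage (requires network access):
# curl -sI https://alephdigital.publica.la | extract_caddy_id
```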
Monitoring and Maintenance
We can monitor all the servers and certificate types renewal from this dashboard https://ohdear.app/status-page/https-servers-and-certificates
Misc Caddy and servers
Use this command to switch to the root user and get access to Caddy's logs
sudo su - root
Use this command to tail Caddy's logs
journalctl -u caddy.service -b -f -n 10
Use this command to list Caddy's custom modules, such as DynamoDB storage
caddy list-modules --skip-standard
DynamoDB
You can run all these commands from AWS CloudShell.
COUNT LIKE %.publica.la% AND NOT LIKE %.app.publica.la%
aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix) AND NOT contains(PrimaryKey, :exclude_suffix)" \
--expression-attribute-values '{":suffix": {"S": ".publica.la"}, ":exclude_suffix": {"S": ".app.publica.la"}}' \
--select "COUNT"
COUNT ocsp records
aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix)" \
--expression-attribute-values '{":suffix": {"S": "ocsp"}}' \
--select "COUNT"
# -> 1053
COUNT .app.publica.la records
aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix)" \
--expression-attribute-values '{":suffix": {"S": ".app.publica.la"}}' \
--select "COUNT"
# -> 3319
COUNT .staging-farfalla.publica.la records
aws dynamodb scan \
--table-name caddy_ssl_certificates \
--filter-expression "contains(PrimaryKey, :suffix)" \
--expression-attribute-values '{":suffix": {"S": ".staging-farfalla.publica.la"}}' \
--select "COUNT"
# -> 3319
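The scans above differ only in the substring they look for, so they can be wrapped in one function. This is a convenience sketch, not an existing script: `count_keys_containing` and `attr_values` are hypothetical names, and the substring is assumed to contain no double quotes or backslashes, since it is placed into JSON without escaping.

```shell
# Hypothetical helper: build the --expression-attribute-values JSON for a substring.
attr_values() {
  printf '{":needle": {"S": "%s"}}' "$1"
}

# Hypothetical wrapper: count records whose PrimaryKey contains the given
# substring, using the same table and scan options as the commands above.
count_keys_containing() {
  aws dynamodb scan \
    --table-name caddy_ssl_certificates \
    --filter-expression "contains(PrimaryKey, :needle)" \
    --expression-attribute-values "$(attr_values "$1")" \
    --select "COUNT"
}

# Usage (from AWS CloudShell):
# count_keys_containing ".app.publica.la"
# count_keys_containing "ocsp"
```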
Recipes
Recipes are small Bash scripts that we use to run tasks across many servers.
We store recipes in our Forge account; more in their docs.
Recipes are named using this pattern: {create/update date} --- {target environment} --- {purpose}
These are the recipes we currently have:
- 54551 2024.09.05 GENERAL Upgrade custom Caddy build
- 34095 2023.03.11 STAGING Setup new Caddy server from scratch
- 61607 2024.06.06 STAGING Upgrade custom Caddy build and update config
- 63346 2024.09.05 STAGING Download manual HTTPS certificate 20240905_wilcard-cert_.staging-farfalla.publica.la
- 41983 2024.09.05 STAGING Update Caddy config
- 34096 2023.03.11 PRODUCTION Setup new Caddy server from scratch
- 61609 2024.06.06 PRODUCTION Upgrade custom Caddy build and update config
- 63347 2024.09.05 PRODUCTION Download manual HTTPS certificate 20240905_wilcard-cert_.publica.la
- 41984 2024.09.05 PRODUCTION Update Caddy config
* The number at the beginning is the recipe ID in Forge.
Recipes are updated almost every time we use them, because most of the time we're making at least a small change. This is why we consider Forge's version of the recipes the "source of truth"; the recipes you see here are only a reference.
Creating a new server
To set up a new Caddy server, follow these steps:
1. Create and configure the server using Forge
- Log in to Forge with your credentials
- Go to the Servers page and press the button CREATE CIRCLE SERVER
- Select the circle publica.la and the Staging or Production credential, depending on the case.
- Create a server with the following characteristics:
  - Type: Load Balancer
  - Name: Use the following naming convention: custom-domains-{staging or prod}-{location}-{number} (refer to other previously created servers)
  - Region: Select the region of your choice
  - Server Size: Select the size of your choice
  - VPC: Select Create New
  - VPC Name: Give a meaningful name to the new VPC
  - Post-Provision Recipe: Use the recipe according to your needs.
  - Make sure the option Add Server's SSH Key To Source Control Providers is checked.
  - Use the recipe called "Setup new Caddy server from scratch".
2. Connect the server to AWS Global Accelerator
After you have successfully created the new server, you need to add it to the AWS Global Accelerator (load balancer). Follow these steps:
- Enter the Global Accelerator page in the AWS Console
- Select CustomDomainsProduction
- You'll find a listener for each port, 443 and 80
- Select a listener and press the button Add endpoint group
- In region info, select the region of the new server
- Expand Configure health checks
- Set Health check port to 80
- Set Health check protocol to HTTP
- Set Health check path to /health
- Set Health check interval to 10
- Set Threshold count to 2
- Click Next and add the Caddy server as endpoint.
- Save your changes
Repeat the endpoint group steps for each listener, 443 and 80.
Make sure to add the IP of the new instance to Farfalla.
Be mindful: if you resize an instance or change its IP, you'll need to update the previous file in farfalla.
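For reference, the health check settings from step 2 map onto the AWS CLI roughly as follows. This is a sketch of an alternative to the console procedure, not something we currently run: the function name `create_endpoint_group`, the `AWS_CMD` override, and the three arguments are hypothetical, and the listener ARN and EC2 instance ID must be looked up for the real accelerator.

```shell
# Hypothetical sketch: create an endpoint group for one listener with the
# same health check settings used in the console steps above.
# Arguments: listener ARN, region of the new server, EC2 instance ID.
# Set AWS_CMD=echo to print the command instead of running it.
create_endpoint_group() {
  local listener_arn="$1" region="$2" instance_id="$3"
  # The Global Accelerator API is called through the us-west-2 region,
  # regardless of where the endpoints live.
  ${AWS_CMD:-aws} globalaccelerator create-endpoint-group \
    --region us-west-2 \
    --listener-arn "$listener_arn" \
    --endpoint-group-region "$region" \
    --health-check-port 80 \
    --health-check-protocol HTTP \
    --health-check-path /health \
    --health-check-interval-seconds 10 \
    --threshold-count 2 \
    --endpoint-configurations "EndpointId=${instance_id}"
}

# Usage (dry run, placeholder values):
# AWS_CMD=echo create_endpoint_group "<listener-arn>" sa-east-1 "<instance-id>"
```

As with the console flow, the call would need to be repeated once per listener (443 and 80).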
Upgrading Caddy build
1. Get custom build
If you want to upgrade Caddy to a new version, but not its config, follow these steps:
- Visit Caddy Download page
- Select platform Linux amd64
- Select module caddy.storage.dynamodb
- Select module dns.providers.cloudflare
- Press button Download, wait for the binary to be built and downloaded
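Before uploading the binary, it can be worth confirming that both modules were actually included in the build. The check below is a minimal sketch; `has_required_modules` is a hypothetical helper name, and it assumes `caddy list-modules --skip-standard` prints one module ID per line.

```shell
# Hypothetical check: read a module list on stdin (one module ID per line)
# and fail if either custom module our fleet depends on is missing.
has_required_modules() {
  local modules required
  modules="$(cat)"
  for required in caddy.storage.dynamodb dns.providers.cloudflare; do
    # -x matches whole lines, -F treats the module ID as a literal string.
    printf '%s\n' "$modules" | grep -qxF "$required" || {
      echo "missing module: $required" >&2
      return 1
    }
  done
}

# Usage, after making the downloaded binary executable:
# chmod +x caddy_linux_amd64_custom
# ./caddy_linux_amd64_custom list-modules --skip-standard | has_required_modules
```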
2. Upload to S3
Once you've downloaded the binary caddy_linux_amd64_custom, upload it to our caddy store repository in our S3 account: https://caddy-store.s3.amazonaws.com/.
The bucket is in the production account, the one with Account ID 375481448855.
Remember to also make the file publicly accessible.
Make sure to select the role publica.la - production to find the proper bucket.
3. Upgrade servers using recipe
Now that you have uploaded the new version to the bucket, update the recipe "GENERAL Upgrade custom Caddy build" to point to the new build and run it on the intended servers.
HTTPS Guard - Manually test
How to validate a customer's domain?
Production
We have to execute the following URL:
https://farfalla-https-guard.publica.la/api/v1/caddy-check-BYZJVBNM8WUVXRDZ?domain=domain_client
Replacing domain_client with the domain you want to verify.
This test can yield two results:
- Status 200, with a message saying Domain Authorized.
- Status 503, if the entered domain does not exist.
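The same check can be run from a terminal. The sketch below wraps the production URL above with curl; the names `GUARD_URL`, `guard_verdict` and `guard_check` are hypothetical, and any status other than 200 or 503 is simply reported as unexpected because this page does not document other responses.

```shell
# Production endpoint of farfalla-https-guard, as documented above.
GUARD_URL="https://farfalla-https-guard.publica.la/api/v1/caddy-check-BYZJVBNM8WUVXRDZ"

# Hypothetical helper: translate the guard's HTTP status into a readable verdict.
guard_verdict() {
  case "$1" in
    200) echo "authorized" ;;
    503) echo "unknown domain" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical helper: query the guard for a domain and print the verdict.
# --get appends the URL-encoded domain as a query string (requires network access).
guard_check() {
  local status
  status="$(curl -s -o /dev/null -w '%{http_code}' \
    --get --data-urlencode "domain=$1" "$GUARD_URL")"
  guard_verdict "$status"
}

# Usage:
# guard_check digital.revisbarcelona.com
```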
Staging
Replacing domain_client with the domain you want to verify.
This test can yield two results:
- Status 200, with a message saying Domain Authorized.
- Status 503, if the entered domain does not exist.
For more information visit the following link
Troubleshooting
This section provides guidance on common issues and how to resolve them. While our Caddy server setup is robust, occasional issues can arise.
Handling Unhealthy Servers
Our monitoring service, OhDear, checks the health of each Caddy server by sending a GET request to its public IP address on the /health path (e.g., http://18.209.57.166/health). If a server fails this health check, OhDear sends an alert through Squadcast. These occurrences are rare.
If you receive an alert for an unhealthy server, follow these steps to investigate and resolve the issue:
- Access AWS Console: Log in to the appropriate AWS account (e.g., staging or production).
- Navigate to Global Accelerator: Go to the Global Accelerator service page.
- Identify Unhealthy Listener: Check the listeners. An "Unhealthy endpoint" status indicates an issue. Note that since the same server handles traffic for both port 80 (HTTP) and port 443 (HTTPS), both listeners might show as unhealthy if a server is down.
- Inspect Endpoint Group: Navigate to the specific "Endpoint group" associated with the unhealthy listener. The health status should also be visible here.
- Locate the EC2 Instance: Identify the EC2 instance acting as the endpoint. The health status will be visible here as well. Note the EC2 Instance ID (e.g., i-04c795823b2c82eb7).
- Reboot the Instance: If the server is confirmed to be unresponsive or unhealthy, select the EC2 instance and choose the "Reboot" option. In most cases, a reboot resolves the issue.
It is unusual for these servers to become unresponsive. If rebooting does not resolve the issue, further investigation into Caddy logs or server metrics may be necessary.
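Before or after a reboot, the same request OhDear makes can be reproduced by hand. The following is a minimal sketch, assuming the /health path on port 80 described above; `health_url` and `check_health` are hypothetical helper names, and the 5 second timeout is an arbitrary choice.

```shell
# Hypothetical helper: build the health check URL for a server's public IP.
health_url() {
  printf 'http://%s/health\n' "$1"
}

# Hypothetical helper: report whether a server answers its health check.
# -f makes curl fail on HTTP error statuses; --max-time avoids hanging
# on a server that is not responding at all.
check_health() {
  if curl -fsS -o /dev/null --max-time 5 "$(health_url "$1")"; then
    echo "$1 healthy"
  else
    echo "$1 UNHEALTHY"
    return 1
  fi
}

# Usage (requires network access), with the production IPs from the table above:
# for ip in 34.229.139.178 15.228.13.208 52.30.112.138; do check_health "$ip"; done
```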
Maintenance Log
Thursday 2024.09.09
Responsible: Franco Gilio and Ignacio Milano
Reason:
- Continuation from Thursday 2024.09.05 maintenance tasks
Action:
- Updated staging servers to use a wildcard certificate for *.publicala.me and *.staging-farfalla.publica.la, via Let's Encrypt.
- Updated production servers to use a wildcard certificate for *.publica.la and *.app.publica.la, via Let's Encrypt.
- Re-enable Let's Encrypt as a fallback, in case ZeroSSL fails
- Use engineering+caddy@publica.la for Let's Encrypt emails
- Remove unused health_timeout directive -> https://arc.net/l/quote/iyjemdzg
- Add downstream and upstream header to identify each server -> header_down +X-Caddy-Id "{{server_id}}"
- Remove redundant X-Forwarded-* headers
- Setup better monitoring in Oh Dear:
Server(s): All servers
Thursday 2024.09.05
Responsible: Franco Gilio and Ignacio Milano
Reason:
- A bug in Caddy, ZeroSSL or both generated errors during certificate renewal. Refs: 1, 2
- Caddy's fallback to Let's Encrypt worked fine until it reached its rate limit.
- At one point all certificates of *.publica.la failed renewal; custom domains were not affected because each has its own rate limit with Let's Encrypt.
Action:
- Updated staging and production servers to use a wildcard certificate for *.publicala.me, *.staging-farfalla.publica.la and *.publica.la.
- Took it as an opportunity to work on this task.
- Took it as an opportunity to simplify the Caddyfiles, now reusing some portions.
- Took it as an opportunity to improve the documentation of how our complete HTTPS system works.
Server(s): All servers
Thursday 2024.06.06 - 02
Responsible: Franco Gilio
Action:
- Update /latest-issue-cover-image route to use micelios new handler that returns the image instead of a redirect
Server(s): All servers
Thursday 2024.06.06 - 01
Responsible: Franco Gilio
Action:
- Upgrade to Caddy 2.8.4
- Update config to use ZeroSSL API instead of ACME endpoint
Server(s): All servers
Wednesday 2023.06.21
Responsible: Franco Gilio
Action:
- Setup new server custom-domains-prod-eu-02, because AWS turned off the previous server
Server(s): All servers
Saturday 2023.03.11
Responsible: Franco Gilio
Action:
- Upgrade Caddy to version v2.6.4 with DynamoDB storage package to 3.0.1
- Create fresh servers for all regions, based on Ubuntu 22
- Add Ireland as EU 01 region
Server(s): All servers
Friday 2022.12.16
Responsible: Franco Gilio
Action: Upgrade Caddy to version v2.6.2 and enable GZIP compression
Script: caddy_server_upgrade.sh
Server(s): All servers
Mon Jan 27, 2022
Responsible: Gonzalo Parra
Action: Upgrade Caddy to version v2.4.6
Script: caddy_server_upgrade.sh
Server(s): All servers
Mon Aug 2, 2021
Responsible: Franco Gilio & Ignacio Milano
Action: Add Let's Encrypt as issuer fallback in Caddy
Script: Add Let's Encrypt as issuer fallback in Caddy - PRODUCTION
Server(s): PRODUCTION Caddy servers
Mon Aug 2, 2021
Responsible: Franco Gilio & Ignacio Milano
Action: Add Let's Encrypt as issuer fallback in Caddy
Script: Add Let's Encrypt as issuer fallback in Caddy - STAGING
Server(s): STAGING Caddy servers
Thu Jul 29, 2021
Responsible: Gonzalo Parra
Action: Update dynamodb configuration in Caddyfile
Script: dynamoDB_fix.sh
Server(s): All servers
Mon Jul 5, 2021
Responsible: Gonzalo Parra
Action: Upgrade Caddy to version v2.4.2
Script: caddy_server_upgrade.sh
Server(s): All servers